AD-A185 841

UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
READ INSTRUCTIONS BEFORE COMPLETING FORM

1. REPORT NUMBER: A.I. Memo 888
2. GOVT ACCESSION NO.
3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle): Massively Parallel Implementations of Theories for Apparent Motion
5. TYPE OF REPORT & PERIOD COVERED: AI-Memo; 1987
6. PERFORMING ORG. REPORT NUMBER
7. AUTHOR(s): Norberto M. Grzywacz and Alan L. Yuille
8. CONTRACT OR GRANT NUMBER(s): N00014-85-K-0124
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS
11. CONTROLLING OFFICE NAME AND ADDRESS: Advanced Research Projects Agency, 1400 Wilson Blvd., Arlington, VA 22209
13. NUMBER OF PAGES: 38
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office): Office of Naval Research, Information Systems, Arlington, VA 22217
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number): Analog networks; Rigidity; 3-D structure; Vision
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

We investigate two ways of solving the correspondence problem for
motion using the assumptions of minimal mapping and rigidity.
Massively parallel analog networks are designed to implement these
theories. Their effectiveness is demonstrated with mathematical
proofs and computer simulations. We discuss relevant psychophysical
experiments.

EDITION OF 1 NOV 65 IS OBSOLETE
S/N 0102-014-6601

UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

MASSACHUSETTS  INSTITUTE  OF  TECHNOLOGY 
ARTIFICIAL  INTELLIGENCE  LABORATORY 

and 

CENTER  FOR  BIOLOGICAL  INFORMATION  PROCESSING 

WHITAKER  COLLEGE 


A.I. Memo No. 888  June 1987

C.B.I.P. Memo No. 016


MASSIVELY  PARALLEL  IMPLEMENTATIONS 
OF  THEORIES  FOR  APPARENT  MOTION 


Norberto  M.  Grzywacz  and  Alan  L.  Yuille1 


ABSTRACT  We investigate two ways of solving the correspondence problem for motion
using the assumptions of minimal mapping and rigidity. Massively parallel analog
networks are designed to implement these theories. Their effectiveness is demonstrated
with mathematical proofs and computer simulations. We discuss relevant psychophysical
experiments.


©  Massachusetts  Institute  of  Technology  1987 


Acknowledgments.  This  report  describes  research  done  within  the  Artificial  Intelligence 
Laboratory  and  the  Center  for  Biological  Information  Processing  (Whitaker  College)  at 
the  Massachusetts  Institute  of  Technology.  Support  for  the  A. I.  Laboratory’s  artificial 
intelligence  research  is  provided  in  part  by  the  Advanced  Research  Projects  Agency  of 
the Department of Defense under Office of Naval Research contract N00014-85-K-0124.
Support  for  this  research  is  also  provided  by  a  grant  from  the  Office  of  Naval  Research, 
Engineering  Psychology  Division. 

1 Current address is Harvard University Department of Applied Mathematics, G12e Pierce
Hall, Cambridge, MA 02138.




Table  of  Contents 


1  Introduction 

2  The  Minimal  Mapping  Theory  for  Apparent  Motion 

2.1  A  Network  Implementation 

2.1.1  Computer  Simulations 

3  Theoretical  results 

4  The  Structural  Theory  for  Apparent  Motion 

4.1  A  Network  Implementation 

4.2  Comparison  with  the  Minimal  Mapping  Theory 

5  Discussion 
References 




1  Introduction 

One of the most important roles of the early human visual system is the extraction of the
three-dimensional (3-D) structure of surfaces (Marr, 1982). It has been proposed that the system deals
with this task through different modules, each analyzing a different type of image information.
One of the most important of these modules is the one that recovers the 3-D shape of objects from
their motion cues. Indeed humans are capable of recovering structure from motion, under both
orthographic and perspective projection, and in the absence of all other cues to 3-D structure (for
examples of the early work see Wallach and O'Connell, 1953; Gibson and Gibson, 1957; White and
Mueser, 1960; Green, 1961; Braunstein, 1962; Johansson, 1964; for a review of the psychophysical
literature see Hildreth, Inada, Grzywacz and Adelson, 1987).

The problem of the recovery of structure from motion is underconstrained because the
image information available in the retina is two-dimensional (2-D), and therefore not enough to
determine the 3-D shape of the visual world. To solve this problem, Ullman (1979) proposed that
the human visual system uses assumptions about the world, such as rigidity of objects, to constrain
the solution. His ideas led to a large body of computational work testing the validity of different
assumptions directed to solve the structure from motion problem (for examples of the early work
see Ullman, 1979; Clocksin, 1980; Prazdny, 1980; Longuet-Higgins, 1981; Longuet-Higgins and
Prazdny, 1981; Tsai and Huang, 1981; for a review of the computational literature see Grzywacz
and Hildreth, 1987).

Ullman  used  psychophysical  data  to  argue  that  the  process  is  divided  into  two  stages. 
The  first  is  solving  the  so-called  correspondence  problem,  which  consists  of  matching  tokens,  such 
as  points  or  straight  lines,  between  different  image  frames  (see  explanation  below).  He  suggested 
that  once  this  matching  is  done  the  second  stage  assumes  rigidity  of  the  object’s  structure  in  order 
to  recover  its  3-D  shape.  (Later,  Ullman  relaxed  the  assumption  of  rigidity  in  favor  of  a  scheme  in 
which  the  transformations  of  structure  from  frame  to  frame  would  be  as  rigid  as  possible,  although 
not  strictly  rigid;  Ullman,  1981.) 

It is not necessary to postulate a solution of the structure from motion problem in
terms of isolated features. In fact, optical flow approaches to the problem have been suggested
(e.g. Prazdny, 1980; Longuet-Higgins and Prazdny, 1981; Hoffman, 1982; Waxman and Ullman,
1985). There are reasons, however, to consider feature-based schemes. The main reason is that
the optical flow field (a 2-D field that can be associated with the variation of the image brightness
pattern) and the 2-D motion field (the projection on the image plane of the 3-D velocity field of a
moving scene) rarely coincide. For some analytic models of surface reflectance this can be proven
(Verri and Poggio, 1986). The problem stems from the fact that image brightness patterns and
their changes do not correspond directly to physical entities and their motion (Ullman, 1979). Not
surprisingly, however, it turns out from Verri and Poggio's work that the optical flow and motion
field nearly coincide at brightness edges and thus at the most elementary type of features.

Another reason to consider the feature-based schemes is that a reliable recovery of
structure from motion seems to require a simultaneous inspection of image frames that have large
separations in time (Wallach and O'Connell, 1953; White and Mueser, 1960; Green, 1961;
Braunstein and Andersen, 1984; Doner, Lappin and Perfetto, 1981; Andersen and Siegel, 1986;
Braunstein, Hoffman, Shapiro, Andersen and Bennett, 1986; Hildreth et al., 1987; Grzywacz, Hildreth,
Inada and Adelson, 1987). This requirement brings back the correspondence problem mentioned
above. In simple words, this is the problem of matching parts in different image frames such that
matched primitives correspond to the same features in the viewed object.

The human visual system is able to solve the correspondence problem even when the
motion is presented in discrete frames which have large separations in time. This is the phenomenon
of long-range apparent motion. (Two distinct processes for the measurement of motion seem to exist
in the human visual system (Braddick, 1974, 1980), one dealing with large separations in space and
time, the long-range motion process, and the other dealing with small separations, the short-range
motion process.) Apparent motion has been studied extensively in the psychophysical literature
(see, for example, Wertheimer, 1912; Korte, 1915; Kolers, 1972; Attneave, 1974; Braddick, 1980;
Ullman, 1979; Anstis, 1980; Green, 1983, 1986; Mutch, Smith and Yonas, 1983; Ramachandran
and Anstis, 1983a,b,c, 1985; Anstis and Mather, 1985; Mather, Cavanagh and Anstis, 1985;
Ramachandran, 1985; Anstis and Ramachandran, 1986; Green and Odom, 1986; von Grunau,
1986; Grzywacz, 1986, 1987; Prazdny, 1986; Ramachandran, Inada and Kiama, 1986; Watson,
1986; Finlay and Dodwell, 1987).

Ullman (1979) proposed a computational theory for apparent motion, which he called
the Minimal Mapping Theory. Minimal mapping is the process by which features in a given frame
are matched to features in another frame such that the sum of the distances traveled is minimal.
(For psychophysical evidence supporting minimal mapping as an important factor in apparent
motion see Ullman, 1979; Williams and Sekuler, 1981; Green and Odom, 1986.) This theory proposes,
therefore, to solve the correspondence problem through the minimization of a cost function. (However,
note that strictly speaking Ullman's theory does not require the minimization of the sum of
Euclidean distances, but allows for more abstract distances such as difference of orientation or
brightness of the features. In this paper we consider only the Euclidean version of the theory.)

Finding the correct cost function, however, is only half the problem. We need a fast and
reliable method of minimizing it. If the cost function is convex there exist many fast and reliable
methods for finding the global minimum. For non-convex cost functions, stochastic relaxation
strategies like the Metropolis (Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953) or
the simulated annealing algorithms (Kirkpatrick, Gelatt and Vecchi, 1983) will generally find the
global minimum, but reportedly take a long time to do so. (For examples of the use of stochastic
relaxation methods in computational vision see Ballard, Hinton and Sejnowski, 1983; Hinton and
Sejnowski, 1983; Geman and Geman, 1984; Marroquin, 1984; Divko and Schulten, 198[...]; Kienker,
Sejnowski, Hinton and Schumacher, 1986; O'Toole and Kersten, 1986; Sereno, 1986.) Ullman
(1979) used a linear programming method to solve the correspondence problem, and although
this always converged correctly it did so very slowly (Ullman, pers. comm.). Instead of a slow
algorithm that always converges to the right answer it may often be a better strategy to use a fast
algorithm that converges to almost the right answer most of the time. This suggests implementing
the problem in terms of deterministic analog networks with parallel architectures (for examples of
the use of deterministic analog networks in computational vision see [...]; Marr and Poggio, 1976;
Ullman, 1979; Feldman and Ballard, 1982; Poggio, Torre and Koch, 1985; Fukushima, 1986;
Grzywacz and Yuille, 1986; Hutchinson and Koch, 1986; Koch, Marroquin and Yuille, 1986;
Rumelhart, Hinton and Williams, 1986; Little, Bülthoff and Poggio, [...]).

An important example of nonlinear analog networks studied in the literature [...]
out of resistors, capacitors and inductances, and whose [...] interact through devices that
implement a static nonlinearity. If this nonlinearity is a sigmoid input-output relationship,
similar to those implemented by synapses, then these networks are called "neural networks"
(Hopfield, 1982, 1984; Hopfield and Tank, 1985) since their units may be regarded as simplified
models of neurons. We emphasize, however, that real neurons are complex computational devices
(Von Neumann, 1958; Koch, Poggio and Torre, 1982; Crill and Schwindt, 1983; Kuffler, Nicholls
and Martin, 1984) and that the name "neural network" is used here only as a metaphor.

Currently, research is being done to construct electronic devices that implement such
networks. If built, they will perform calculations extremely fast because of their parallel, analog
nature. Hopfield and Tank (1985) have shown that these networks are capable of calculating good
approximate solutions to complex minimization problems, such as the Traveling Salesman Problem.
Koch, Marroquin and Yuille (1986) successfully applied them to the surface interpolation problem
of early vision.


The present paper proposes and studies massively parallel "neural network" implementations
designed to solve the correspondence problem in apparent motion (where "massively" means that
every two elementary units are interconnected).

In  Section  2  we  describe  a  “neural-network”  implementation  of  a  version  of  the  Minimal 
Mapping Theory. In the same section we give examples of computer simulations of this
implementation, and show that it accounts for the basic psychophysical apparent motion phenomenology.
This  section  also  presents  a  demonstration  of  the  speed  of  the  “neural-network”  implementation 
and  of  the  fact  that  even  for  very  complex,  nonrigid  motion,  a  nearly  optimal  solution  is  obtained. 
In  Section  3  we  prove  theorems  about  the  convergence  of  the  network  and  show  that  for  some 
situations  the  system  will  always  find  the  correct  solution.  In  the  same  section  we  will  discuss  how 
we  chose  the  network  parameters  for  our  computer  simulations. 

Section 4 is directed to another question. It is natural to ask whether errors are caused
by dividing the structure from motion process into two stages: first solving the correspondence
problem and then using the correspondence information to recover the 3-D shape of objects. The
two stages are solved using different assumptions and it is possible that these conflict for some
stimuli. In this section we use the same mathematical formalism used in the preceding sections
to determine whether rigidity alone (the basic assumption used to recover the 3-D structure from
motion) is sufficient to solve the correspondence problem (and simultaneously the structure from
motion problem). We show that further constraints are usually needed to obtain the correct answers.
This result gives a computational argument in favor of a division of the structure from motion
process into the above two stages. We will also discuss a theory that combines the minimal mapping
and rigidity assumptions and is able to solve the correspondence and the structure from motion
problems simultaneously.


2  The  Minimal  Mapping  Theory  for  Apparent  Motion 


This section will begin with a formal introduction to the Minimal Mapping Theory and propose
a "neural network" implementation of this theory (Section 2.1). We then proceed to demonstrate
that this implementation simulates the basic apparent motion psychophysical phenomenology
(Section 2.1.1), i.e. ambiguous and unambiguous 2-D motions, wagon-wheel type illusions, and
transparent and opaque 3-D motions. We also analyze the convergence time of the network in comparison
with the time constant of its basic units and discuss the quality of the solutions obtained. These
solutions are not strictly correct since the minimization procedure may become trapped in local
minima. We show, however, that those solutions are near optimal. Our main result in this section
is this: provided that the motion is sufficiently small, network parameters can be chosen such that
convergence to the optimal solution is guaranteed.


2.1  A  Network  Implementation 

In the Minimal Mapping Theory (Ullman, 1979), the image of an object with $N$ features is described
by the 2-D coordinates of points on the object, $(x_i(t), y_i(t))$, $i = 1, \ldots, N$. Let images be given
at two instants, $t - \delta t$ and $t$, and let us begin by assuming that the numbers of features at the
two instants are identical. We now define a set of binary correspondence variables $V_{ia}$, such that if
feature $i$ in the first frame maps to feature $a$ in the second frame then $V_{ia} = 1$, otherwise $V_{ia} = 0$.
From the assumptions of the Minimal Mapping Theory we want to define a matching cost function,
$E_{mm}$, which is minimized only when the total distance traveled by the features is minimal. We
follow Ullman and let:

$$E_{mm} = \sum_{i,a} V_{ia}\, d_{ia}, \tag{2.1}$$

where

$$d_{ia} = \left( (x_a(t) - x_i(t - \delta t))^2 + (y_a(t) - y_i(t - \delta t))^2 \right)^{1/2}. \tag{2.2}$$

To find the correspondence, the Minimal Mapping Theory proposes to minimize $E_{mm}$ with respect
to $V_{ia}$, requiring a bijective mapping, i.e. that each feature in the first frame is matched to exactly
one feature in the second frame.
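
As an illustration of Eqs. 2.1 and 2.2, the sketch below finds the minimal mapping for a handful of points by exhaustive search over all bijections. This is not the linear programming method used by Ullman, nor the network proposed in this paper; it is only a reference computation that is practical for very small $N$. The function names and the three-point example are hypothetical.

```python
import itertools
import math


def distance_matrix(frame1, frame2):
    """d[i][a]: Euclidean distance from feature i (frame 1) to feature a (frame 2), Eq. 2.2."""
    return [[math.hypot(xa - xi, ya - yi) for (xa, ya) in frame2]
            for (xi, yi) in frame1]


def minimal_mapping(frame1, frame2):
    """Exhaustive search over bijections; returns (cost, mapping) with mapping[i] = a.

    Assumes both frames hold the same number of features.
    """
    d = distance_matrix(frame1, frame2)
    n = len(frame1)
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):
        # E_mm of Eq. 2.1 for the assignment V_ia = 1 iff a == perm[i]
        cost = sum(d[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


# Hypothetical example: three features translated slightly between frames.
frame1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frame2 = [(x + 0.05, y + 0.02) for (x, y) in frame1]
cost, mapping = minimal_mapping(frame1, frame2)
```

The search visits $N!$ assignments, which is why a faster method is needed for realistic numbers of features.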

In order to perform a fast minimization we adapt in this paper a "neural network"
method proposed by Hopfield and Tank (1985). Consider a system with $N^2$ neuron-like elementary
units symmetrically connected to each other. Each unit will represent a possible correspondence
between feature $i$ at instant $t - \delta t$ and feature $a$ at instant $t$.

We first define a new array of variables, $U_{ia}$, which will represent the internal voltage
of the "neural" units. These are internal variables of the new problem and have a monotonically
increasing relationship to $V_{ia}$ (which will represent the output of these units):

$$U_{ia} = \frac{1}{2\lambda} \log \frac{V_{ia}}{1 - V_{ia}}, \tag{2.3}$$

where $\lambda$ is a positive parameter of the problem. Although [...] one can see from Eq. [...]

[Eq. 2.4 and the energy function $E$ of Eq. 2.5 are illegible in this copy.]

where $A$, $B$, $C$ and $\tau$ are positive parameters of the problem. (We will informally identify each
of the terms on the right hand side of Eq. 2.5 by the parameter leading it.) Minimization of the
first component of the $A$ term forces each feature in the second frame to maintain correspondence
with as few features as possible in the first frame (and vice versa for the second component).
Minimization of the $B$ term tends to force the total number of correspondences to be $N$. Thus the
terms $A$ and $B$ together will tend to force a one-to-one correspondence between features in the
two frames. The $\tau$ term is necessary to give a time constant for convergence of the network, as will
be seen below. Finally, the parameter $C$ serves to provide scaling for the physical dimensions, i.e.
if the image of a given object is just an expansion of the image of another, then the network will
obtain the same solution for the two objects, provided that $C$ is scaled properly.
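
Eq. 2.5 does not survive in this copy. As a hedged reconstruction (not the authors' printed formula), one energy function whose terms match the verbal description of the $A$, $B$, $C$ and $\tau$ terms, and whose gradient reproduces the equation of motion (Eq. 2.9) through the update rule of Eq. 2.6, is sketched below. The factors of 1/2, the lower integration limit and the use of the inverse output function $U(V)$ are assumptions, chosen so that $-\partial E/\partial V_{ia}$ agrees with Eq. 2.9 term by term.

```latex
\begin{align*}
E &= \frac{A}{2}\sum_{a}\sum_{i}\sum_{j \neq i} V_{ia}V_{ja}
   + \frac{A}{2}\sum_{i}\sum_{a}\sum_{b \neq a} V_{ia}V_{ib}
   + \frac{B}{2}\Bigl(\sum_{i,a} V_{ia} - N\Bigr)^{2}
   + C\sum_{i,a} d_{ia}V_{ia}
   + \frac{1}{\tau}\sum_{i,a}\int_{1/2}^{V_{ia}} U(V)\,dV \\[4pt]
-\frac{\partial E}{\partial V_{ia}}
  &= -A\Bigl(\sum_{j \neq i} V_{ja} + \sum_{b \neq a} V_{ib}\Bigr)
     + B\Bigl(N - \sum_{j,b} V_{jb}\Bigr) - C d_{ia} - \frac{U_{ia}}{\tau}
\end{align*}
```

Since $\sum_{j \neq i} V_{ja}$ is the column sum minus $V_{ia}$ and $\sum_{b \neq a} V_{ib}$ is the row sum minus $V_{ia}$, the second line is the right hand side of Eq. 2.9.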

Perceptually, if the two image frames have a different number of features, say $N_1$ and
$N_2$, usually splitting and fusion will take place, such that no feature will be left alone. It is easy
to incorporate this effect into the energy function by substituting $N$ in the $B$ term of Eq. 2.5 by
$N_{max} = \max(N_1, N_2)$. This was done for a few of our computer simulations.

Observe that if the $V_{ia}$ variables are updated according to the differential equations:

$$\frac{dU_{ia}}{dt} = -\frac{\partial E}{\partial V_{ia}}, \qquad 1 \le i \le N, \quad 1 \le a \le N, \tag{2.6}$$

then the system will stop at a point of the solution space in which the function $E$ is at one of its
minima. To see this, observe that because of the monotonicity between $U_{ia}$ and $V_{ia}$ expressed in
Eq. 2.3, the update rule, Eq. 2.6, will tend to force $V_{ia}$ to descend the gradient of $E$. Note
that if $\lambda$ is large enough the variables $V_{ia}$ will tend to be either 0 or 1 and thus, in spite of the fact
that the search process is in a continuous space, it will tend to force a binary decision to determine
whether a correspondence is to be established or not. In fact, using the chain rule for differentiation
and Eq. 2.6 we find


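$$\frac{dE}{dt} = \sum_{i,a} \frac{\partial E}{\partial V_{ia}} \frac{dV_{ia}}{dt} = -\sum_{i,a} \frac{\partial V_{ia}}{\partial U_{ia}} \frac{\partial E}{\partial V_{ia}} \frac{\partial E}{\partial V_{ia}}. \tag{2.7}$$

From Eq. 2.4 we calculate

$$\frac{\partial V_{ia}}{\partial U_{ia}} = \frac{2\lambda}{\left( e^{\lambda U_{ia}} + e^{-\lambda U_{ia}} \right)^2} > 0. \tag{2.8}$$

Therefore $dE/dt \le 0$, which together with the fact that $E \ge 0$ proves that the system will reach
equilibrium, and in that situation $E$ will be at a minimum. Technically this means that $E$ is a
Liapunov function of the system (see also Hopfield, 1984).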

The solution of Eq. 2.6 can be implemented by a "neural network". To calculate
the symmetric connection strength, $T_{ia,jb}$, between unit $ia$ and unit $jb$, and the external input
currents, $I_{ia}$ (data), we substitute Eq. 2.5 into Eq. 2.6:

$$\frac{dU_{ia}}{dt} = -A\left(V_a^{col} + V_i^{row} - 2V_{ia}\right) + B(N - V) - C d_{ia} - \frac{U_{ia}}{\tau}. \tag{2.9}$$

Here we have introduced a new notation: $V = \sum_i \sum_a V_{ia}$, $V_a^{col} = \sum_i V_{ia}$ and $V_i^{row} = \sum_a V_{ia}$.
Equation 2.9 is the equation of motion of the system and was what we simulated in the computer.
Note that the time constant is $\tau$. That implies that the internal resistivity and capacitance of the
network units can be set constant, equal to each other and independent of the problem to be solved.

$T_{ia,jb}$ is the contribution to the rate of change of $U_{ia}$ (the voltage of unit $ia$) by $V_{jb}$
(the output of unit $jb$) and can therefore be readily obtained from Eq. 2.9:

$$T_{ia,jb} = -A\left(\delta_{ab}(1 - \delta_{ij}) + \delta_{ij}(1 - \delta_{ab})\right) - B. \tag{2.10}$$

Similarly $I_{ia}$ is the contribution to the rate of change of $U_{ia}$ which is independent of the state of
other units:

$$I_{ia} = BN - C d_{ia}. \tag{2.11}$$

The $A$ term in Eq. 2.10 represents inhibitory connections within each row and each column of $[V_{ia}]$.
The $B$ term in Eq. 2.10 represents a global inhibition between every pair of units. Therefore, every
two units are mutually connected, with a total of $N^4 - N^2$ connections.
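
The decomposition of Eq. 2.9 into connection strengths (Eq. 2.10) and input currents (Eq. 2.11) can be checked numerically. The sketch below builds $T_{ia,jb}$ and $I_{ia}$ as NumPy arrays and compares $\sum_{jb} T_{ia,jb} V_{jb} + I_{ia} - U_{ia}/\tau$ with the right hand side of Eq. 2.9 on random data. The function names, the value of $A$ and the random test state are assumptions made for illustration only.

```python
import numpy as np


def connections(N, A, B):
    """T[i, a, j, b] of Eq. 2.10."""
    eye = np.eye(N)
    delta_ij = eye[:, None, :, None]   # delta_ij broadcast over (i, a, j, b)
    delta_ab = eye[None, :, None, :]   # delta_ab broadcast over (i, a, j, b)
    return -A * (delta_ab * (1 - delta_ij) + delta_ij * (1 - delta_ab)) - B


def input_currents(d, B, C):
    """I[i, a] of Eq. 2.11."""
    N = d.shape[0]
    return B * N - C * d


def rhs_eq_2_9(U, V, d, A, B, C, tau):
    """Right hand side of Eq. 2.9 written with row and column sums."""
    N = d.shape[0]
    col = V.sum(axis=0)[None, :]   # V_a^col = sum over i of V_ia
    row = V.sum(axis=1)[:, None]   # V_i^row = sum over a of V_ia
    return -A * (col + row - 2 * V) + B * (N - V.sum()) - C * d - U / tau


# Hypothetical parameters and a random state, used only for the consistency check.
N, A, B, C, tau = 4, 1e3, 1e4, 1.0, 1.0
rng = np.random.default_rng(0)
V = rng.random((N, N))
U = rng.normal(size=(N, N))
d = rng.random((N, N))

T = connections(N, A, B)
I = input_currents(d, B, C)
via_connections = np.einsum("iajb,jb->ia", T, V) + I - U / tau
direct = rhs_eq_2_9(U, V, d, A, B, C, tau)

# Count the nonzero connections between distinct units.
T_flat = T.reshape(N * N, N * N)
off_diagonal = ~np.eye(N * N, dtype=bool)
n_connections = int(np.count_nonzero(T_flat[off_diagonal]))
```

With $B > 0$ every pair of distinct units has a nonzero connection, so the count equals $N^4 - N^2$.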

The $B$ term in Eq. 2.11 is the excitation bias and is equally applied to every unit. The
$C$ term in Eq. 2.11 is the inhibitory current through which the data is provided to the system. The
larger the $d_{ia}$, the more a feature would have to travel from place $i$ in the first frame to place
$a$ in the second frame, and the less favorable this connection should be; therefore more inhibition
is applied to the corresponding "neural unit".

It is important to note that in contrast with Hopfield and Tank's method for the
Traveling Salesman Problem (Hopfield and Tank, 1985), the data enter into our system as applied
currents and not as modifications of the connectivities between units.

In the next section we present the results of our computer simulations, obtained by the numerical
integration of Eq. 2.9.

2.1.1  Computer  Simulations 

We simulated this network on a Symbolics 3600 LISP machine. In our simulations we did not try
to optimize the parameters $A$, $B$, $C$, $\tau$ and $\lambda$ in any sense. (Although for the simulations reported
in this paper, we took into account the rules discussed in Section 3.) Instead we found that the
asymptotic behavior of the system was the same for a large range of parameter values (a few orders
of magnitude), and that a given set of parameters would give correct solutions to problems with
a different number of features. For all the simulations reported in this paper (unless reported
otherwise) we used $A = 10^{[\ldots]}$, $B = 10^4$, $C = 1$, $\tau = 1$ and $\lambda = 1$, and the maximal distance
between features in an object was always 1. Finally, we used homogeneous initial conditions for
our simulations, i.e.:

$$V_{ia}(t = 0) = [\ldots] \tag{2.12}$$
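
A minimal sketch of such a simulation is given below, under several stated assumptions. It integrates Eq. 2.9 with a forward Euler scheme (the text says only that numerical integration was used), takes the output function to be the logistic sigmoid $V = \frac{1}{2}(1 + \tanh \lambda U)$, and starts from the homogeneous state $V_{ia} = 1/N$. The value of $A$ and the initial value are illegible in this copy, so $A = 10^3$ and $1/N$ are assumptions; $B$, $C$, $\tau$ and $\lambda$ follow the text. The function name, the step-size heuristic and the three-point example are hypothetical, and the sketch requires $N \ge 2$.

```python
import numpy as np


def simulate_network(frame1, frame2, A=1e3, B=1e4, C=1.0, tau=1.0, lam=1.0, t_end=0.1):
    """Forward-Euler integration of Eq. 2.9; returns the final output array V[i, a]."""
    p1 = np.asarray(frame1, dtype=float)
    p2 = np.asarray(frame2, dtype=float)
    N = len(p1)
    # d[i, a]: Euclidean distance between feature i (frame 1) and feature a (frame 2), Eq. 2.2
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=2)
    # Homogeneous initial condition; the value 1/N is an assumption.
    V = np.full((N, N), 1.0 / N)
    U = np.log(V / (1.0 - V)) / (2.0 * lam)
    # Heuristic step size, kept small relative to the fastest rate of the network.
    dt = 1.0 / (lam * (2.0 * A * N + B * N * N))
    for _ in range(int(t_end / dt)):
        col = V.sum(axis=0)[None, :]   # V_a^col
        row = V.sum(axis=1)[:, None]   # V_i^row
        dU = -A * (col + row - 2.0 * V) + B * (N - V.sum()) - C * d - U / tau
        U = U + dt * dU
        # Assumed sigmoid output, written with tanh to avoid overflow for large |U|.
        V = 0.5 * (1.0 + np.tanh(lam * U))
    return V


# Hypothetical example: three well separated features translated slightly.
frame1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frame2 = [(x + 0.05, y + 0.02) for (x, y) in frame1]
V = simulate_network(frame1, frame2)
matches = V.argmax(axis=1)   # matches[i] = index a of the strongest correspondence
```

For this small displacement the array settles close to a diagonal form, so each feature is matched to its own displaced copy.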


The first simulations showed that the network can correctly replicate apparent motion percepts.
Figure 1 illustrates the matching predicted for a 10-feature object rotating by 10°. (Our simulations
extended to objects containing up to 20 features.) In Fig. 1b the same object translates slightly.
In our figures the features in the first frame are always represented by squares and those in the
second frame by triangles. The labels for the features are maintained after the motion, so that the
expected values for the $[V_{ia}]$ matrix at equilibrium should be close to 1 on the diagonal, and close
to 0 off the diagonal. The temporal evolution of this matrix in the rotation case of Fig. 1 is shown
in a 3-D plot in Fig. 2. (A similar temporal evolution was obtained for the translation.) The
solid lines in Fig. 1, and in similar figures afterwards, indicate the established correspondences,
i.e. the maxima of the $[V_{ia}]$ arrays. The durations of network computation for this figure were
$0.06\tau$ and $0.045\tau$ for the rotation and translation respectively. (We point out that the dependence
of the convergence time of the simulated parallel network on the complexity of the problem is
different from that of the CPU time of the computers on which the simulation was performed. This
is because these computers were serial. Thus the CPU times were irrelevant for our conclusions
and were not monitored.)


Figure 1. The network matching predictions for a moving object of 10 features. The positions of the
features are represented in the first frame by squares and in the second by triangles. A specific feature is
indicated by the same index in the two frames and the solid lines indicate the correspondences established
by the network. a. The object is rotated by 10° around the optic axis. The + indicates the center of the
rotation. b. The object is translated to the right. The correct correspondences were established in both
cases. They are expected to be correct when the displacement between frames is small.


Note that the correct correspondence was obtained, i.e. the diagonal of the array $[V_{ia}]$ was preferred
(Fig. 2). Incorrect matches were suppressed to several orders of magnitude below the correct ones.
In the rotation case, even for feature number 10, which by simple proximity would prefer to match
features 1, 2 or [...] (Fig. 1), the global consensus held and the correct correspondence was made.

Note in Fig. 2 that at $t = 0$ the array is flat, which indicates the lack of preference for
any particular correspondence. Afterwards, a competition between the correspondences is initiated
until the diagonal is preferred ($t = 0.00375\tau$, $0.0075\tau$, $0.015\tau$). Only after this diagonal is chosen
are the last false matches eliminated ($t = 0.03\tau$, $t = 0.06\tau$).


Figure 2. A 3-dimensional plot of the time evolution of the correspondence array for the rotation case
of Fig. 1. In the six graphs the vertical axis represents the value of $V_{ia}$ (and ranges from 0 to 1), and
the $i$ and $a$ axes represent the feature indices in the first and second frames respectively. The times of
computation for the arrays are displayed on the upper right corner of each graph. The correctness of
the correspondence found in Fig. 1 is illustrated here by the convergence of the array to a diagonal form.

Another result of interest in Figs. 1 and 2 is that the times of convergence were shorter than the
time constant of the elementary units of the network ($0.06\tau$ and $0.045\tau$ for the rotation and
the translation respectively). In Section 3, we prove that even in equilibrium the outputs $V_{ia}$ are
different from 0 or 1, although they can approach these values [...]. For practical purposes a
criterion threshold has to be set arbitrarily to define convergence. In Fig. 1, for example, we set
this threshold at [...]. That is, at $t = 0.06\tau$ in the rotation case and $t = 0.045\tau$ in the translation
case, all the array elements were either below [...] or above [...]. [...] The fact
that the convergence of the system was faster than the time constant of the elementary units means
that the variables $V_{ia}$ can pass the threshold criterion very fast, although technically they will reach
equilibrium only after a time constant or so has elapsed. At any rate, the time of convergence of the
system is limited only by $\tau$, and can be very short. In all the figures in this paper the convergence
time was much shorter than $\tau$.

The example of Fig. 1 is such that the extent of motion is small. In Section 3, we
prove a theorem which states that for short motions a choice of parameters can be made such that
convergence to the correct solution is guaranteed. The result in Fig. 1 confirms this theorem.

Not only for small motions, however, does the network simulate psychophysical percepts.
In the case of large rotations, for example, perceptual illusions often occur. This is because
in these situations, features can travel large distances, and may approach positions in the second
frame that originally were occupied by other features. Such an example is the wagon-wheel illusion,
a well known motion picture effect, in which a spoked wagon wheel seems to rotate in the direction
opposite to its real sense of rotation. This illusion is also obtained by the network, and is illustrated
in Fig. 3. In this example, eight features disposed in the corners of a perfect octagon rotate 11°15'
in one case (Fig. 3a) and 33°45' in another case (Fig. 3b). The 3-D plots of the matrices $[V_{ia}]$ at
the convergence time are shown in Figs. 3c and 3d for Figs. 3a and 3b respectively. The convergence
time for this figure was $0.02\tau$.

The wagon-wheel illusion is established by the incorrect correspondences that happen
in the large rotation. (Instead of the diagonal, a rotation permutation of the array $[V_{ia}]$ was
selected.) Once again, the incorrect matches were suppressed by many orders of magnitude.
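The reversal can be checked with a small minimal-mapping calculation on the octagon alone. The sketch below is only an illustration, not the network itself: the function name `apparent_rotation` is hypothetical, the features are placed on a unit circle, and only the cyclic one-to-one matchings are compared (an assumption made to keep the example short).

```python
import math

def apparent_rotation(n_features, rotation_deg):
    """Return (shift, apparent_deg) minimizing the total distance traveled.

    First-frame feature i is matched to second-frame feature (i + shift) mod n.
    Only the n cyclic matchings are compared (a simplifying assumption).
    """
    step = 360.0 / n_features
    best = None
    for shift in range(n_features):
        # Signed angular displacement of every feature under this matching,
        # wrapped to the interval [-180, 180).
        delta = (rotation_deg + shift * step + 180.0) % 360.0 - 180.0
        # Chord length on a unit circle, summed over all features.
        total = n_features * 2.0 * math.sin(math.radians(abs(delta)) / 2.0)
        if best is None or total < best[0]:
            best = (total, shift, delta)
    return best[1], best[2]

small = apparent_rotation(8, 11.25)   # 11 deg 15 min rotation
large = apparent_rotation(8, 33.75)   # 33 deg 45 min rotation
print(small, large)
```

For the small rotation the shortest matching is the identity, while for the large rotation it is the matching to the neighboring corner, whose displacement has the opposite sign; this is the same non-diagonal permutation of the array described above.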

The network can also deal in a psychophysically appropriate way with ambiguous
situations, i.e. cases of perceptual metastability. An example of such a situation is shown in Fig.
4 and has been studied extensively in the psychophysical literature (Von Schiller, 1933; Gengerelli,
1948; Ramachandran and Anstis, 1983a,b,c; 1985). It consists of two features disposed at the
ends of an imaginary rigid rod. The rod rotates at each new frame by 90° around its center. The
features in the second frame are equidistant to each one of the features in the first frame. It follows
that a given feature in the first frame is equally likely to match both features in the second frame,
thus giving rise to a metastable situation. The numerical values in the matrix $[V_{ia}]$ at the time of
convergence are given in the figure. The time of convergence was [value illegible]$\tau$, and the array did
not change even after $10\tau$.

The metastability of the motion display is expressed in the fractional results computed
by the network. This is possible because the variables are not binary (Eq. 2.3), although they often tend
to 0 or 1 at equilibrium. The interpretation of these fractional results should be in probabilistic
terms: i.e. a given feature in the first frame has a probability close to 0.5 of matching a given
feature in the second frame. Indeed, when noise intervenes in the data to the network, i.e. when
there is a random modulation of the distance between the features, the system no longer converges
to 0.5, but rather, a one-to-one matching choice is made by the network. Finally, we point out
that the sum of the matching probabilities for a feature reported by the network is less than 1,
since all the $V_{ia} = 0.4975 < 0.5$. This result is not a numerical artifact, as in Section 3 we prove
analytically that $V < N$ (where $V$ was defined in Eq. 2.9). We also prove in the same section,
however, that a choice of network parameters can be made such that $V$ is arbitrarily close to $N$.


Figure 3. The wagon-wheel illusion. The symbols in Figs. a and b are the same as in Fig. 1, and the axes
of Figs. c and d are the same as in Fig. 2. An object whose features lie on the corners of a perfect octagon
is rotated around the optic axis. The rotations were 11°15' and 33°45' in Figs. a and b, respectively. The
established correspondence was correct for the small rotation but incorrect for the large one; the reported
direction of rotation was reversed, as is the case for humans. Figures c and d show the correspondence array
at the time of convergence for the small and large rotations respectively. The illusion corresponds to the
network converging to a diagonal form in the first case, but to a non-diagonal form in the second case.


(In humans, if the visual display of Fig. 4 is presented repeatedly, the percept is either
of oscillation or rotation depending on the temporal parameters of the stimulus (Ramachandran
and Anstis, 1983a,b,c; 1985). However, the percept predicted by the Minimal Mapping Theory, and
thus by our network, is random from presentation to presentation. In fact, it can be shown that
the solution $V_{ia} \approx 0.5$ is unstable, and any noise pushes the final values to 0 or 1. This discrepancy
between the predictions and the psychophysics is accounted for by the Minimal Mapping Theory's
omission of information about the past motion of the features; see the Discussion section for more
details on the limitations of the Minimal Mapping Theory.)


Figure 4. An ambiguous situation. The symbols are the same as in Fig. 1. The two features of the first
frame are equally likely to match either feature of the second frame. The network deals with this problem
by converging to values that are neither 0 nor 1. For all of the previous examples the networks converged
to binary values. The matrix shown in the figure is the final value reached by the correspondence array. Its
values are close to 0.5, and therefore close to the probability that a particular match is made. For humans
such a display is bistable. The reason why the result is not exactly 0.5 is not a numerical artifact and is
explained in the text.

As  pointed  out  in  the  introduction,  Ullman  (1979)  suggested  that  the  main  role  of 
apparent  motion  is  to  serve  as  the  first  stage  in  the  process  of  recovering  the  3-D  structure  of 
objects  from  their  motion.  It  follows,  therefore,  that  the  apparent  motion  mechanism  has  to  cope 
with perceptual oddities due to 3-D motion, particularly nonrigidity in the image, and appearance
and  disappearance  of  features  due  to  occlusions.  Figure  5  illustrates  how  the  network  deals  with 
these  problems  and  shows  that  its  solutions  are  similar  to  those  of  the  visual  system. 

In the figure, a 3-dimensional 5-feature object is rotated by 27° around an axis which is
perpendicular to the viewing axis, and which belongs to the plane that divides the head between left
and right. From a bird's eye view, the features of the object lie on the corners of a perfect pentagon
(Fig. 5a), and are projected orthographically into the image plane. This projection is shown in
Figs. 5b and 5c under the assumption that the object is transparent and opaque respectively. In
the opaque case it is assumed that only the front features can be seen by the observer (see Fig.
5a).

In the transparent case all five features are seen, and the relative distances between
features in the image change, because features in different positions on the surface have different
velocities. Note in Fig. 5b that this image nonrigidity does not disturb the ability of the network
to solve the correspondence problem. The convergence time for this figure was $0.12\tau$.

In the opaque case only three of the features are seen in the first frame and two in
the second. The other features are occluded by the surface. The main problem that the network
faces in this case is that the first frame has more features than the second. Perceptually this leads
to fusion, i.e. two features or more from the first frame match one in the second (Kolers, 1972).
The network also obtained this solution (Fig. 5c, $t = 0.1\tau$) when $N$ in Eq. 2.5 was substituted
by $N_{max}$ as explained in Section 2. Note also that the fusion obtained by the network had the
minimal mapping property, i.e. features tended to travel as little as possible. The same strategy
(i.e. substituting $N$ by $N_{max}$) leads to splitting, i.e. a feature in the first frame matches two or
more in the second, if the number of features in the second frame is larger than that of the first.
(This result is again similar to human perception; Kolers, 1972. Fusion and splitting, however,
have been shown to disappear if the knowledge of occlusion is present; Ramachandran and Anstis,
1983b.)


Figure 5. Nonrigidity and the appearance and disappearance of features. a. A bird's eye view of an object
rotating by 27° around an axis perpendicular to the viewing axis and vertical in relation to the head of
the observer (shown schematically in the figure). The features of the object lie on the corners of a perfect
pentagon. b. The object is assumed transparent. The correspondences are computed correctly, in spite of
the nonrigidity of the image, i.e. features travel by different amounts. c. The features are assumed to lie
on the surface of an opaque cylinder. Note that feature 2 appears in the first frame, but not in the second.
The solution of the network matches that of the human visual system, and features 1 and 2 fuse in the
second frame.

We  show  in  Section  3,  that  for  short  motions,  the  right  parameters  can  be  chosen,  such 
that  the  correct  solution  is  obtained  by  the  network.  This  seems  to  be  the  reason  for  the  success  of 
the network in the simulation of perceptual data (Figs. 1-5). This fact does not imply, however,
that  the  network  converges  in  general  to  the  global  minimum  of  the  energy  function  given  in  Eq. 
2.5.  In  fact  we  illustrate  in  Figs.  6  and  7  that  for  random  motions  an  incorrect  matching  may  be 
found.  We  also  show,  however,  that  even  if  the  correspondence  is  incorrectly  established,  it  is  near 
optimal. 


For  Fig.  6  a  computational  experiment  with  450  runs  was  done.  For  each  run  the  first 
and  second  frame  consisted  of  two  objects  of  6  features  each,  randomly  placed  in  a  disc  of  radius 
1.  The  correct  match,  i.e.  the  one  that  minimizes  the  total  distance  traveled  by  the  features,  was 
established  by  exhaustive  search.  The  network  was  then  applied  for  the  450  runs  and  the  number  of 
cases  that  fell  in  each  of  the  following  four  categories  was  observed:  1.  correct  answers,  2.  incorrect 
answers  but  one-to-one  matching,  3.  lack  of  one-to-one  matching  but  six  matches,  and  4.  less  than 
six  matches.  The  frequency  histogram  is  shown  in  Fig.  6. 
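The exhaustive-search baseline used in this experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`random_disc_points`, `total_distance`, `minimal_mapping`), the rejection sampling of the disc and the random seed are all hypothetical choices.

```python
import itertools
import math
import random

def random_disc_points(n, rng):
    """Draw n points uniformly inside the unit disc by rejection sampling."""
    points = []
    while len(points) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            points.append((x, y))
    return points

def total_distance(frame1, frame2, perm):
    """Total distance traveled when feature i is matched to feature perm[i]."""
    return sum(math.dist(frame1[i], frame2[a]) for i, a in enumerate(perm))

def minimal_mapping(frame1, frame2):
    """Exhaustive search over all one-to-one matchings (n! candidates)."""
    n = len(frame1)
    return min(itertools.permutations(range(n)),
               key=lambda perm: total_distance(frame1, frame2, perm))

rng = random.Random(0)
frame1 = random_disc_points(6, rng)
frame2 = random_disc_points(6, rng)
best = minimal_mapping(frame1, frame2)
print(best, total_distance(frame1, frame2, best))
```

With six features the search visits 6! = 720 matchings, which is cheap; the factorial growth is what makes such a search impractical for larger numbers of features.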

Note that a one-to-one mapping was always established (and consequently the number
of  matches  was  always  six).  In  this  experiment,  however,  only  58.4%  of  the  solutions  computed  by 
the  network  corresponded  to  minimal  mapping. 

In the other 41.6% of the cases, an incorrect answer was found. These incorrect solutions,
however, were near optimal as seen in Fig. 7. Four motions for which an incorrect mapping
was established are displayed in Figs. 7a-d. In these figures the correct matches, as found by
exhaustive search, are marked by the dotted lines, and the predictions of the network are marked
by the solid lines. Note that the solutions found by the network were almost identical to the optimal
ones, and the error was each time the switching of only one pair of correspondences.

The histograms in Figs. 7e-h correspond to Figs. 7a-d, respectively. They plot
the distribution of the total distance traveled by the features, for the 6! = 720 possible cases of
one-to-one matching. The arrows in these histograms show the total distance traveled for the
answer given by the network. Note that as predicted by Figs. 7a-d, the network results fell in
near optimal positions, i.e. many standard deviations away from the mean of the distribution.
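The position of a matching within this distribution can be computed directly. The sketch below is an illustration with hypothetical random frames, not the data of Fig. 7: it enumerates the 720 total distances, then reports the rank and the distance from the mean (in standard deviations) of the optimum and of a matching obtained by switching one pair of correspondences. Which pair is switched is an arbitrary assumption; the network's actual errors are not reproduced here.

```python
import itertools
import math
import random
import statistics

rng = random.Random(1)

def random_disc_point():
    """Uniform point in the unit disc (polar sampling with sqrt radius)."""
    r, phi = math.sqrt(rng.random()), 2.0 * math.pi * rng.random()
    return (r * math.cos(phi), r * math.sin(phi))

frame1 = [random_disc_point() for _ in range(6)]
frame2 = [random_disc_point() for _ in range(6)]

def total_distance(perm):
    """Total distance traveled when feature i is matched to feature perm[i]."""
    return sum(math.dist(frame1[i], frame2[a]) for i, a in enumerate(perm))

perms = list(itertools.permutations(range(6)))      # 6! = 720 matchings
totals = [total_distance(p) for p in perms]
mean, sd = statistics.mean(totals), statistics.pstdev(totals)

def rank(perm):
    """Number of matchings that are strictly shorter than perm."""
    d = total_distance(perm)
    return sum(t < d for t in totals)

optimal = min(perms, key=total_distance)
# Hypothetical "network error": the optimum with one pair of matches switched.
switched = list(optimal)
switched[0], switched[1] = switched[1], switched[0]
switched = tuple(switched)

for name, perm in (("optimal", optimal), ("one pair switched", switched)):
    z = (total_distance(perm) - mean) / sd
    print(f"{name}: rank {rank(perm)} of {len(perms)}, z = {z:.2f}")
```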

Another fact of interest related to the experiment in Fig. 6, and which may provide a
psychophysically testable prediction for such types of networks, is that the time of convergence is
much longer on average for incorrect matches than it is for correct ones. In fact, for the last 150
runs of the experiment in Fig. 6, the mean time of convergence for cases where correct matches
were predicted was $0.10\tau \pm 0.001\tau$ (standard error), and the mean time for the incorrect cases was
$0.366\tau \pm 0.013\tau$. Errors are due to a conflict between the necessity for minimization of the total
distance traveled and the necessity for one-to-one matching. These conflicts often cause a delay in
the decision process of the network. In Fig. 8 we illustrate this fact for the paradigm of Fig. 7d.
Similarly to Fig. 2, we show the temporal evolution of the $[V_{ia}]$ array.


Figure 6. A frequency histogram for correct vs. incorrect matchings. For 450 runs the first and the second
frames consisted of two random objects of six features each (the features were randomly placed on a disc
of radius 1). The first column corresponds to the cases where a true minimal mapping was found by the
network, i.e. the sum of the distances traveled by the features is minimal as verified by an exhaustive
search. The second column corresponds to the cases where the minimal mapping was not found by the
network, but a one-to-one matching was still made. There was no case where a one-to-one match
failed to appear (third and fourth columns of the histogram). Thus, the correct solution is not always
obtained.

Note that at $t = 0.06\tau$, the values of $V_{32}$ and $V_{33}$ begin to rise, mainly driven by the
proximity of feature 3 in the first frame to features 2 and 3 in the second frame (see Fig. 7d).
Given the imposition of one-to-one matches, this leads to a slow competition between $V_{32}$ and $V_{33}$
($t = 0.12\tau$, $0.24\tau$). In the meantime the values of four other array elements [indices illegible] rose and
converged to 1 at about $t = 0.24\tau$. From the exhaustive search we found that the optimal solution
implied that one particular array element [indices illegible] equals 1. This was an impossible solution
for the network after $t = 0.24\tau$, because a competing element [indices illegible] was already $\approx 1$. It followed that
the network could not reach an optimal solution anymore and settled to a nearly optimal one,
in which $V_{32} \approx 0$ and $V_{12} \approx V_{33} \approx 1$. The long time of convergence was due to the inability of $V_{32}$
to rise, due to the imposition of one-to-one matching, and to the weak capacity of the network to
increase $V_{12}$ because of the large distance between feature 1 in the first frame and feature 2 in the
second.


Figure 7. Near optimal computations by the network. a-d. The four cases were taken from the experiment
done in Fig. 6, and show examples where minimal mapping was not found by the network. The symbols
are similar to those of Fig. 1. The dotted lines represent the correct minimal mapping as found by an
exhaustive search. The mistakes made by the network were always the switching of only one pair of
correspondences. e-h correspond to a-d respectively. These histograms show the distribution of the total
distance traveled by the features for all of the possible cases of one-to-one mapping. The abscissa has
arbitrary scale (but equal in all histograms). The histograms have the same area; 6! = 720 matching cases.
The arrows indicate the total distance traveled for the solution obtained by the network (in figures f and g
this value was contained by the leftmost bin of the histogram). In the cases where errors were made, the
solution was nevertheless near optimal.


Figure 8. How errors are made by the network. The figure shows the time evolution of the correspondence
array for the example shown in Fig. 7d. For an explanation of the details see Fig. 2. The mistake is made
because of the conflict between minimal mapping and one-to-one matching. From a minimal mapping
point of view, the matches $V_{32}$ and $V_{33}$ would be preferred. This, however, goes against the one-to-one
matching requirement. While $V_{32}$ and $V_{33}$ compete, other matches, which are not necessarily correct from
a minimal mapping point of view, develop.

The main reason for building an implementation of the Minimal Mapping Theory in
terms of “neural networks” is to obtain a fast convergence to the solution. This was the case for
the examples shown so far, in which the convergence happened in a fraction of the time constant
of the elementary units of the network. We now bring evidence that this speed persists even
when the number of features in motion increases. In order to demonstrate this we performed an
experiment whose results are plotted in the graph of Fig. 9. For each entry in the graph a few runs
were performed. Each run consisted of an object of a given number of features (abscissa) randomly
placed on a disc of radius 1. The object was identical in the first and second frames to guarantee
that a correct solution would be obtained. The average time of convergence and the standard error


where $F$ and $\gamma$ are positive constants. For comparison the thin solid line shows a linear dependence
(adjusted to be equal to the experiment for the two features case), i.e. $\gamma = 1$. Note that the
dependence of the solution obtained by the network is sublinear. In fact its power was about
$\gamma = 0.52$. (This means that from the point of view of discrete optimization the network method
has a complexity of about $O(n^{1/2})$.) One sees, therefore, that the convergence time of the “neural
network” scales weakly (square root) with the number of features in motion.
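An exponent such as $\gamma$ can be estimated by a straight-line fit in log-log coordinates. The sketch below assumes that the fitted curve has the power-law form $t = F n^{\gamma}$, which is what the surrounding text suggests but is not legible in this copy; the function name `fit_power_law` is hypothetical, and the data are synthetic placeholders generated from an exact square-root law, not the measurements plotted in Fig. 9.

```python
import math

def fit_power_law(ns, times):
    """Least-squares fit of log(t) = log(F) + gamma * log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    F = math.exp(y_mean - gamma * x_mean)
    return F, gamma

# Synthetic convergence times generated from an exact square-root law.
# These are placeholders, NOT the measurements plotted in Fig. 9.
ns = [2, 4, 8, 16, 32]
times = [0.05 * math.sqrt(n) for n in ns]
F, gamma = fit_power_law(ns, times)
print(F, gamma)
```

On real measurements the fit would be applied to the mean convergence time at each number of features, and the standard errors could be used as weights.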

The strength of this result is emphasized if one considers good serial algorithms to solve
the same problem. Mathematically, minimal mapping is a discrete optimization problem known
as the linear assignment problem (Burkard, 1979). Some of the best serial algorithms proposed
to solve this problem scale with the third power of the number of features (Dinic and Kronrod,
1969; Tomizawa, 1971), i.e. $\gamma = 3$. (Once again, this implies that from the point of view of discrete
optimization these methods have a complexity of about $O(n^3)$.) The relatively strong dependence
of the serial methods is illustrated by the dashed line of Fig. 9. Note the much steeper slope of
the serial algorithms, compared to the network implementation. (There are not at the present time,
as far as we know, studies of the complexities of other parallel solutions for the correspondence or
related problems. Therefore a comparison of our network with other parallel methods was
not possible.)

In  conclusion  we  have  shown  evidence  that  the  convergence  time  of  the  “neural 
network”  implementation  of  the  Minimal  Mapping  Theory  scales  weakly  with  the  number  of  features 
in  motion,  and  therefore,  remains  short  even  for  cases  with  a  large  number  of  features.  This  is  due 
to  the  massive  nature  of  the  connectivity  of  the  network,  which  allows  information  to  travel  at  high 
rates  from  unit  to  unit  in  the  network. 

In the next section we prove theoretical results related to the quality of convergence of
the “neural network” implementation of the Minimal Mapping Theory.

3  Theoretical  results 

Hopfield and Tank (1985) demonstrated good solutions to the Traveling Salesman Problem for up
to thirty cities. It seems that for a larger number of cities the solutions become less good (Hopfield,
pers. comm.). We have reasons to believe that the network reported in this paper behaves similarly.
Our problem, however, is different in an important aspect. The size of the $d_{ia}$'s depends on the
time between matched image frames. We prove this theorem: provided that the extent of motion
is sufficiently small, the network will always obtain the correct match. Therefore, an increase in the
number of features to be matched can be compensated for by reducing the time between frames.

In order to show this result we prove that if the diagonal terms of the $[d_{ia}]$ matrix are
sufficiently small compared to the off-diagonal terms, then we can choose the parameters of
the system such that it will always converge to the correct solution. At the end of the section, we
will use this and other results to explain how choices of parameters were made in this work.

We will first show, however, that the strengths of matches, $V_{ia}$, are never exactly 0 or
1, but can only approach these values arbitrarily closely. In the proof for this claim we will
provide a derivation of an analytic expression for the equilibrium solutions of the network.

As shown in Section 2, Eq. 2.5 is a Liapunov function for the system. Therefore the
solutions of the system are asymptotically stable. It follows that at equilibrium $dU_{ia}/dt = 0$ or
from Eq. 2.9:


$$U_{ia} = \tau\left(B(N - V) - C d_{ia} - A\left(V_i^{COL} + V_a^{ROW} - 2V_{ia}\right)\right), \tag{3.1}$$

which is an analytic expression for the equilibrium solution of the system. The values of $V_{ia}$ are
bounded: $0 \le V_{ia} \le 1$ (Eq. 2.3). It follows that the right-hand side of Eq. 3.1 is bounded from below
and above. Indeed:


$$\tau\left(B(N - N^2) - C d_{ia} - 2AN\right) \le U_{ia} \le \tau\left(BN - C d_{ia}\right). \tag{3.2}$$

This proves that at equilibrium, $0 < V_{ia} < 1$, because by Eq. 2.3, $V_{ia} \to 1\ (0)$ if and only if
$U_{ia} \to +\infty\ (-\infty)$.
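The approach to this equilibrium can be illustrated with a small numerical integration. The sketch below rests on several assumptions, because Eqs. 2.3 and 2.9 are not restated in this section: the update rule is taken to be $dU_{ia}/dt = -U_{ia}/\tau + B(N - V) - C d_{ia} - A(V_i^{COL} + V_a^{ROW} - 2V_{ia})$, which is the form implied by Eqs. 3.1 and 3.4; the gain function is taken to be the logistic $V_{ia} = 1/(1 + e^{-2\lambda U_{ia}})$, which is the form implied by Eq. 3.13; and the function name `simulate`, the parameter values, the Euler step and the three-feature example are all hypothetical.

```python
import math

def simulate(d, A=1.0, B=10.0, C=10.0, lam=1.0, tau=1.0, dt=1e-3, steps=10000):
    """Euler integration of the assumed update rule; returns the array V."""
    n = len(d)
    # Flat start: V_ia = 1/n, so that the total activity V equals n at t = 0.
    u0 = math.log((1.0 / n) / (1.0 - 1.0 / n)) / (2.0 * lam)
    U = [[u0] * n for _ in range(n)]

    def sigmoid(u):
        return 1.0 / (1.0 + math.exp(-2.0 * lam * u))

    for _ in range(steps):
        V = [[sigmoid(u) for u in row] for row in U]
        total = sum(map(sum, V))
        row_sum = [sum(row) for row in V]
        col_sum = [sum(V[i][a] for i in range(n)) for a in range(n)]
        for i in range(n):
            for a in range(n):
                # Right-hand side of Eq. 3.1 without the factor tau.
                drive = (B * (n - total) - C * d[i][a]
                         - A * (row_sum[i] + col_sum[a] - 2.0 * V[i][a]))
                U[i][a] += dt * (-U[i][a] / tau + drive)
    return [[sigmoid(u) for u in row] for row in U]

frame1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frame2 = [(x + 0.10, y + 0.05) for x, y in frame1]   # a short translation
d = [[math.dist(p, q) for q in frame2] for p in frame1]
V = simulate(d)
for row in V:
    print(["%.3f" % v for v in row])
```

In this short-motion example the diagonal entries end up large and the off-diagonal entries small, while every entry stays strictly between 0 and 1, as the bound above requires.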

The values of $V_{ia}$ are different from 0 and 1 not only at equilibrium. Indeed, differentiating
Eq. 2.4 and substituting in Eq. 2.6 yields:


$$\frac{dV_{ia}}{dt} = -2\lambda V_{ia}(1 - V_{ia})\frac{\partial E}{\partial V_{ia}}. \tag{3.3}$$

It follows that if at $0 < t' < \infty$, $V_{ia} = 1\ (0)$, then $dV_{ia}/dt = 0$. (One can show that $\partial E/\partial V_{ia}$ is
always finite.) Therefore, if at a given instant $V_{ia} = 1\ (0)$, then it remains there forever.
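The step leading to Eq. 3.3 can be written out explicitly. The derivation below is a sketch under two assumptions, since the equations of Section 2 are not restated here: that the gain function is the logistic $V_{ia} = 1/(1 + e^{-2\lambda U_{ia}})$ (the form consistent with Eq. 3.13), and that Eq. 2.6 is the gradient rule $dU_{ia}/dt = -\partial E/\partial V_{ia}$.

```latex
% Derivative of the assumed logistic gain function:
\frac{dV_{ia}}{dU_{ia}}
  = \frac{2\lambda\, e^{-2\lambda U_{ia}}}{\left(1 + e^{-2\lambda U_{ia}}\right)^{2}}
  = 2\lambda\, V_{ia}\,(1 - V_{ia}),
\qquad\text{since}\qquad
1 - V_{ia} = \frac{e^{-2\lambda U_{ia}}}{1 + e^{-2\lambda U_{ia}}}.

% Chain rule combined with the assumed gradient rule dU/dt = -dE/dV:
\frac{dV_{ia}}{dt}
  = \frac{dV_{ia}}{dU_{ia}}\,\frac{dU_{ia}}{dt}
  = -2\lambda\, V_{ia}\,(1 - V_{ia})\,\frac{\partial E}{\partial V_{ia}}.
```

The factor $V_{ia}(1 - V_{ia})$ vanishes at both 0 and 1, which is why a variable that reached either value would stay there.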

Let  us  now  state  the  main  result  of  this  section. 


THEOREM: For given $A$ and $N \ge 2$, if $d_{ii} < d_{jb}$, $1 \le i, j, b \le N$, $j \ne b$, then
for any $1 > \epsilon > 0$, there are $B_0 > 0$ and $C_0 > 0$, such that if $B > B_0$ and $C > C_0$, it
follows that, at equilibrium, $1 - V_{ii} < \epsilon$ and $V_{jb} < \epsilon$.


In the process of proving this theorem we will provide bounds for $B_0$ and $C_0$ in terms of $A$, the
data parameters and $\epsilon$. We begin our proof with three short lemmas.


LEMMA 1: At equilibrium $N > V$.

Proof:

Consider the update Eq. 2.9. This can be written as


$$\frac{d}{dt}\left(U_{ia} e^{t/\tau}\right) = e^{t/\tau}\left(-A\left(V_i^{COL} + V_a^{ROW} - 2V_{ia}\right) + B(N - V) - C d_{ia}\right). \tag{3.4}$$

If $N - V \le 0$, then $U_{ia}\exp(t/\tau)$ decreases, because the sum of the terms on the right-hand side of
3.4 is negative. This implies that $U_{ia}$ and consequently $V_{ia}$ and $V$ decrease. The assertion of the
lemma then follows from the fact that at $t = 0$, $V = N$ (see initial conditions in Eq. 2.11).

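LEMMA 2: For given $A$, if $d_{ii} < d_{jb}$, $1 \le i, j, b \le N$, $j \ne b$, then for any $\alpha > 0$,
there is $C_0 > 0$, such that if $C > C_0$, it follows that at equilibrium $U_{ii} - U_{jb} > \alpha\tau$.


Proof:


From Eq. 3.1 one obtains that at equilibrium:


$$\frac{U_{ii} - U_{jb}}{\tau} = -A\left(V_i^{COL} + V_i^{ROW} - V_j^{COL} - V_b^{ROW} - 2V_{ii} + 2V_{jb}\right) + C\left(d_{jb} - d_{ii}\right) \ge -NA + C d^*,$$

where $d^* = \min_{i,\, j \ne b}\left(d_{jb} - d_{ii}\right)$. This inequality holds because by Lemma 1,
$\left|V_i^{COL} + V_i^{ROW} - V_j^{COL} - V_b^{ROW} - 2V_{ii} + 2V_{jb}\right| \le N$.
Let $C > C_0 = (\alpha + NA)/d^*$, then:


$$U_{ii} - U_{jb} > \alpha\tau.$$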


LEMMA 3: For given $A$, $C$ and $N \ge 2$, and for any $\epsilon > 0$, there is $B_0 > 0$ such that
if $B > B_0$ then at equilibrium $N - V < \epsilon$.


Proof:


From Eq. 3.1 and by Lemma 1, the following inequality can be written at equilibrium:


$$\frac{U_{ia}}{\tau} \ge -AN + B(N - V) - C d^{m}, \tag{3.7}$$

where $d^{m} = \max_{i,a} d_{ia}$. Let $B > B_0 = (AN + C d^{m})/\epsilon$. Then $N - V < \epsilon$. This is because, if on
the contrary $N - V \ge \epsilon$, it follows:


$$U_{ia} > 0. \tag{3.8}$$

From Eq. 2.3 this implies that $V_{ia} > 1/2$ or:


$$V \ge \frac{N^2}{2} \ge N, \tag{3.9}$$

which is in contradiction to Lemma 1 and implies $N - V < \epsilon$.


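We now proceed with the proof of the theorem.


Proof of the Theorem:

Let


$$B > B_0 = \frac{2\left(AN + C d^{m}\right)}{\epsilon} \tag{3.10}$$

and


$$C > C_0 = \frac{\log\left(\left((2N - \epsilon)\left(2(N^2 - N) - \epsilon\right)\right)/\epsilon^2\right) + 2\lambda\tau AN}{2\lambda\tau d^*}. \tag{3.11}$$

We want to prove that $V_{jb} < \epsilon$ and $V_{ii} > 1 - \epsilon$. In the first case we will prove a stronger result,
namely $V_{jb} < \epsilon/(2(N^2 - N))$. Suppose on the contrary that $V_{jb} \ge \epsilon/(2(N^2 - N))$. In that case


$$\sum_{j \ne b} V_{jb} \ge \frac{\epsilon}{2}, \tag{3.12}$$

and from Eq. 2.4:


$$U_{jb} \ge \frac{1}{2\lambda}\log\frac{\epsilon}{2(N^2 - N) - \epsilon}. \tag{3.13}$$

From the proof of Lemma 2 and Condition 3.11, one obtains:


$$U_{ii} - U_{jb} > \frac{1}{2\lambda}\log\frac{(2N - \epsilon)\left(2(N^2 - N) - \epsilon\right)}{\epsilon^2}. \tag{3.14}$$

Combining Eqs. 3.13 and 3.14 and substituting the result into Eq. 2.3 one obtains
$V_{ii} > 1 - \epsilon/(2N)$ for every $i$, and therefore:


$$\sum_i V_{ii} > N - \frac{\epsilon}{2}. \tag{3.16}$$

However,


$$V = \sum_i V_{ii} + \sum_{j \ne b} V_{jb}. \tag{3.17}$$

Thus, from Eqs. 3.12 and 3.16 one obtains that $V > N$, which is a contradiction to Lemma 1. This
implies $V_{jb} < \epsilon/(2(N^2 - N)) < \epsilon$.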

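Let us now prove that $V_{ii} > 1 - \epsilon$. Because $V_{jb} < \epsilon/(2(N^2 - N))$, we obtain


$$\sum_{j \ne b} V_{jb} < \frac{\epsilon}{2}. \tag{3.18}$$

Also, from Condition 3.10 and the proof of Lemma 3,


$$V > N - \frac{\epsilon}{2}. \tag{3.19}$$

From Eqs. 3.17, 3.18 and 3.19 one obtains:


$$\sum_k V_{kk} > N - \epsilon.$$

But $V_{kk} < 1$, thus:


$$V_{ii} > N - \epsilon - \sum_{k \ne i} V_{kk} > 1 - \epsilon, \tag{3.20}$$

which is the desired result.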

We have shown, therefore, that the network is capable of exactly solving the correspondence problem
for motions smaller than the internal distances of the object. This is particularly important for
non-dense objects, i.e. those containing small to medium numbers of features (e.g. Fig. 1). Our
computer simulations confirm this result, and indicate that for such objects, a near optimal match
is obtained for complex large motions (Fig. 7).

The development of the theorem, and other results, suggest rules of thumb for the
choice of the network's parameters. Consider the energy function in Eq. 2.5. For given $N$, a
proportional change of parameters $A$, $B$, $C$ and $1/\lambda$ will only scale the shape of $E$, and thus will
not change the equilibrium solutions of the system. Also, the dynamics of convergence will not
be changed, because a modulation of these parameters will cause an inversely proportional change
in $\lambda$, leaving the equation of motion unmodified. (To understand this claim more easily see the
equation of motion in the form expressed in Eq. 3.3.) It follows, contrary to what was concluded
by Hopfield (198?), that the absolute value of the parameter $A$ is irrelevant; only its value relative
to the other parameters matters. In all of our simulations and in the rest of this discussion, $A$ was
set to 1.

A few extra rules of thumb can also be derived from our results. Equation 3.10 suggests
that the parameter $B$ has to be high compared to $AN$ and $Cd^{m}$ (with $d^{m}$ the largest $d_{ia}$). The equation gives a formula for
how large $B$ should be in terms of the precision required in the problem ($\epsilon$). Equation 3.11 suggests
that $Cd^*$ should be relatively high compared to $AN$ for short motions. Simulations showed that
$1/\lambda$ should be high if the system has to solve ambiguous situations in which multiple matches to a
given feature are possible (Fig. 4).
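These rules of thumb can be turned into a small calculation. The sketch below evaluates the bounds of Eqs. 3.10 and 3.11 as they are read here from a damaged copy, so the exact form of both formulas should be treated as an assumption; the function name `parameter_bounds`, the helper names `d_star` (smallest gap between an off-diagonal and a diagonal distance) and `d_max` (largest distance), and the example distance matrix are hypothetical.

```python
import math

def parameter_bounds(d, eps, A=1.0, lam=1.0, tau=1.0):
    """Return (B0, C0) as read from Eqs. 3.10 and 3.11 (hedged reconstruction)."""
    n = len(d)
    # d_star: smallest gap between an off-diagonal and a diagonal distance.
    d_star = min(d[j][b] - d[i][i]
                 for i in range(n) for j in range(n) for b in range(n) if j != b)
    if d_star <= 0:
        raise ValueError("theorem requires d_ii < d_jb for all j != b")
    d_max = max(max(row) for row in d)
    log_term = math.log((2 * n - eps) * (2 * (n * n - n) - eps) / eps ** 2)
    C0 = (log_term + 2 * lam * tau * A * n) / (2 * lam * tau * d_star)
    # B0 is evaluated at C = C0; any larger C calls for a larger B0.
    B0 = 2 * (A * n + C0 * d_max) / eps
    return B0, C0

# Two features that each move 0.1 while staying 1.0 away from the other one.
d_example = [[0.1, 1.0], [1.0, 0.1]]
B0, C0 = parameter_bounds(d_example, eps=0.1)
print(B0, C0)
```

Both bounds grow as the required precision $\epsilon$ shrinks, and $C_0$ grows as the motion becomes comparable to the distances between features (small `d_star`). The bounds are sufficient conditions only, so smaller values may also work in practice.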

4  The  Structural  Theory  for  Apparent  Motion 

In this work so far, we developed and analyzed a “neural network” implementation of the Minimal
Mapping Theory. The justification for the Minimal Mapping Theory is based on Ullman's argument
(1979) that the structure from motion process is divided in two stages: first solving the correspondence
problem, then using the correspondence information to recover the 3-D shape of objects. In
this section the same mathematical formalism of the preceding sections is used, i.e. that of the
“neural networks”, to bring some support to Ullman's two-stage hypothesis. We study whether
rigidity alone (the basic assumption used to recover the 3-D structure from motion) is sufficient
to solve the correspondence problem (and simultaneously the structure from motion problem). We
assume rigidity in the form used by Ullman (1984). We call the theory based on rigidity alone the
Structural Theory for apparent motion. It is shown that further constraints are usually needed to
help this theory obtain correct answers.

4.1 A Network Implementation

In this section we do not use the assumption of strict rigidity, but rather Ullman's incremental
rigidity scheme, which allows for nonrigid motions (Ullman, 1984; Grzywacz and Hildreth, 1986,
1987; Grzywacz et al., 1987; Hildreth et al., 1987). In the incremental rigidity scheme an object
with N features is described by a model $(x_i(t), y_i(t), z_i(t))$, for $i = 1, \ldots, N$. The x, y components are
directly observable (assuming orthographic projection) and the z components are to be deduced.
At t = 0 the z components are set to zero. Then, at each instant, one uses the previous values of
the z's, $z_i(t - \delta t)$, to calculate the new ones, $z_i' = z_i(t)$. This calculation minimizes deviations of
the object's rigidity, $\Delta R$, between frames. $\Delta R$ may be defined as follows. First define $L_{ij}(t)$ by

$$L_{ij}(t) = (x_i(t) - x_j(t))^2 + (y_i(t) - y_j(t))^2 + (z_i(t) - z_j(t))^2. \tag{4.1}$$

$$\Delta R = \sum_{i,j} \left[ L_{ij}(t) - L_{ij}(t - \delta t) \right]^2. \tag{4.2}$$
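
As a concrete reading of Eqs. 4.1 and 4.2, the following minimal sketch computes the rigidity deviation between two models of the same object. The function names and the three-feature example are hypothetical and chosen only for illustration; the sum is taken over all ordered pairs (i, j), which is one plausible reading of the summation.

```python
def squared_distance(p, q):
    """L_ij of Eq. 4.1: squared distance between two (x, y, z) points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def rigidity_deviation(previous_model, new_model):
    """Delta R of Eq. 4.2, summed over all ordered feature pairs (i, j)."""
    n = len(previous_model)
    total = 0.0
    for i in range(n):
        for j in range(n):
            change = (squared_distance(new_model[i], new_model[j])
                      - squared_distance(previous_model[i], previous_model[j]))
            total += change ** 2
    return total


# Hypothetical three-feature model (x, y, z); the numbers are illustrative only.
model = [(0, 0, 0), (0, 0, 3), (4, 0, 3)]
translated = [(x + 1, y + 2, z + 5) for x, y, z in model]  # a rigid motion
deformed = [(0, 0, 0), (0, 0, 3), (4, 0, 4)]               # one feature moved in depth

print(rigidity_deviation(model, translated))  # rigid motion: every L_ij is unchanged
print(rigidity_deviation(model, deformed))    # nonrigid change: positive deviation
```

A rigid motion leaves every pairwise squared distance unchanged, so its deviation vanishes, while any change of shape is penalized.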

The Structural Theory proposes to solve simultaneously the correspondence and the structure from
motion problems. This is to be done by finding the correspondences which, upon application of the
incremental rigidity scheme, yield the minimal $\Delta R$. We now use the set of binary correspondence
variables $V_{ia}$ to define a new matching cost function $E_R$, whose minimization is equivalent to that
proposed by the Structural Theory:


$$E_R = \sum_{i,j,a,b}^{N} \left( L_{ab}(t) - L_{ij}(t - \delta t) \right)^2 V_{ia} V_{jb}. \tag{4.3}$$
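
To make the role of the correspondence variables in Eq. 4.3 concrete, here is a minimal sketch that evaluates the cost for a given binary matrix V, with V[i][a] = 1 when first-frame feature i is matched to second-frame feature a. The function names are hypothetical. The example reuses the 3, 4, 5 triangle of Section 4.2, but the 90° rotation and the assumption that second-frame depths are known are illustrative choices made here to keep the arithmetic in integers.

```python
def squared_distance(p, q):
    """Squared distance between two (x, y, z) points, as in Eq. 4.1."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def structural_cost(first_frame, second_frame, V):
    """Matching cost of Eq. 4.3 for binary correspondence variables V[i][a]."""
    n = len(first_frame)
    cost = 0
    for i in range(n):
        for j in range(n):
            l_ij = squared_distance(first_frame[i], first_frame[j])
            for a in range(n):
                for b in range(n):
                    l_ab = squared_distance(second_frame[a], second_frame[b])
                    cost += (l_ab - l_ij) ** 2 * V[i][a] * V[j][b]
    return cost


# Features A, B, C as (x, y, z); second frame is an assumed 90-degree rotation about y.
first_frame = [(0, 0, 0), (0, 0, 3), (4, 0, 3)]
second_frame = [(0, 0, 0), (-3, 0, 0), (-3, 0, 4)]

correct = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # A->A', B->B', C->C'
swapped = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # A->B', B->A', C->C'

print(structural_cost(first_frame, second_frame, correct))
print(structural_cost(first_frame, second_frame, swapped))
```

With the true depths, the correct matching preserves every pairwise distance and gives zero cost, whereas the swapped matching pairs sides of different length and is penalized.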

To find the correspondence and structure simultaneously by using incremental rigidity, we minimize
$E_R$ with respect to $z_i'$ and $V_{ia}$, requiring that all features in the first frame are matched to exactly
one feature in the second. The method is similar to the one described for the Minimal Mapping
Theory. It begins by substituting the $E_{mm}$ term of Eq. 2.5 by $E_R$ of Eq. 4.3. It proceeds by
updating the $V_{ia}$ variables (see definition in Eq. 2.4) by using simultaneously the equations of
motion 2.6 and


$$\frac{dz_i'}{dt} = -\beta \frac{\partial E}{\partial z_i'}, \qquad 1 \le i \le N, \tag{4.4}$$


where $\beta$ is a positive parameter of the problem. As in the case of the Minimal Mapping Theory,
E is a Liapunov function of the system. This is because for the Structural Theory Eq. 2.7 can be
rewritten as:


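$$\frac{dE}{dt} = \sum_{i,a} \frac{dV_{ia}}{dt} \frac{\partial E}{\partial V_{ia}} - \beta \sum_{i} \left( \frac{\partial E}{\partial z_i'} \right)^2, \tag{4.5}$$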


which together with Eq. 2.8 proves that $dE/dt \le 0$. It follows that also for the Structural Theory
the system will stop at a point of the solution space in which the function E is at one of its minima.
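
The Liapunov property can be checked numerically for the depth variables alone. The sketch below is not the network of the paper: it holds the correspondences fixed at the correct matches, assumes that the depth update is a gradient descent of the form $dz_i'/dt = -\beta\,\partial E/\partial z_i'$, and integrates it with a simple Euler step. The 90° rotation, the step size `dt`, and all function names are assumptions made for illustration.

```python
def pair_residual(i, j, x, z, lengths):
    """Change in squared distance for pair (i, j) relative to the first frame."""
    return (x[i] - x[j]) ** 2 + (z[i] - z[j]) ** 2 - lengths[i][j]


def energy(x, z, lengths):
    """Rigidity cost with fixed matches: sum of squared residuals over ordered pairs."""
    n = len(x)
    return sum(pair_residual(i, j, x, z, lengths) ** 2
               for i in range(n) for j in range(n))


def gradient(x, z, lengths):
    """Partial derivatives of the energy with respect to each depth z[k]."""
    n = len(x)
    return [8 * sum(pair_residual(k, j, x, z, lengths) * (z[k] - z[j])
                    for j in range(n))
            for k in range(n)]


# First frame: the 3, 4, 5 triangle in the x-z plane (y is irrelevant here).
x1, z1 = [0, 0, 4], [0, 3, 3]
lengths = [[(x1[i] - x1[j]) ** 2 + (z1[i] - z1[j]) ** 2 for j in range(3)]
           for i in range(3)]

x2 = [0, -3, -3]           # image x after an assumed 90-degree rotation
z = [0.01, -0.01, 0.005]   # small random initial depths, as in the text
beta, dt = 10.0, 1e-5      # beta from the text; dt is an assumed Euler step

energies = [energy(x2, z, lengths)]
for _ in range(5000):
    g = gradient(x2, z, lengths)
    z = [zk - beta * dt * gk for zk, gk in zip(z, g)]
    energies.append(energy(x2, z, lengths))

print(energies[0], energies[-1])
```

With a sufficiently small step the recorded energies should never increase, which is the discrete counterpart of $dE/dt \le 0$.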

The next section illustrates the results of our simulations with the equations of motion
2.6 and 4.4, and compares the results to those obtained for the Minimal Mapping Theory. It also
discusses a theory which is a hybrid between the Structural and the Minimal Mapping theories,
and which seems to give rise to better behavior than either of the isolated theories.


4.2  Comparison  with  the  Minimal  Mapping  Theory 

Despite extensive experimentation with the parameters, the system based on the Structural Theory
rarely converged to the correct answer, unless given a hint of the correct matches. The system made,
however, some interesting mistakes. It would sometimes choose matches and depth values for the
features in such a way that the model of the object for the second frame had almost the same
3-D structure as the model for the first frame, but such that the motion between frames was
complicated. We illustrate this phenomenon in Fig. 10.

In the example shown in this figure, a three-feature object was rotated around an axis
perpendicular to the x-z plane by [...]°. (It can be shown that in this case, if we use a matching
cost function of the form expressed in Eq. 4.3, the y coordinates of the features are irrelevant to
the problem.) When observed from a bird's eye view the object looked like a right triangle
with sides 3, 4 and 5 (solid straight lines of Fig. 10a). The x coordinates of the three features in the
first frame were 0, 0 and 4 for features A, B and C, respectively. The z coordinates for the same
features were 0, 3 and 3. The rotation was anticlockwise (with feature A fixed), when observed
from the bird's eye view. The solid lines of Fig. 10b show the position of the object in the second
frame from this view. The values of x were directly measurable by the observer. We assumed that
the observer knew the values of z in the first frame. The values of z for the second frame and
the values of the $V_{ia}$'s were calculated by integrating the equations of motion 2.6 and 4.4. (The
parameters used in this display were A = 6000, B = 10000, C = 10, $\tau$ = 1, $\lambda$ = 50 and $\beta$ = 10. The
initial values of the z coordinates in the second frame were close to zero, but randomly chosen. In
this example those coordinates were 0.01, -0.01 and 0.005 for features A, B and C, respectively.)

Figure 10: The errors of the Structural Theory. The solid triangles are the bird's eye views of the moving
object; a. shows the first frame and b. the second. The dashed triangle in a. is the triangle computed
by the network implementation of the Structural Theory. The image coordinates of A′, B′ and C′ are the
same as the image coordinates in the second frame of A, B and C, respectively. The curved arrows show
the computed correspondences. The computed structure and correspondences were incorrect. However, if
the computed structure is superimposed on the true structure, while forcing their corresponding corners to
be close, then they are shown to be similar (Fig. 10b). Thus, such a theory may be able to compute a rough
estimate of the structure of an object, without having to solve the correspondence problem.

A bird's eye view of the solution is shown in the dotted lines of Fig. 10a. The curved
arrows indicate the motions observed (as shown by the correspondence variables, $V_{ia}$). Note that
these motions were incorrect and very complicated. The 3-D structure of the new triangle, however,
was not very different from the original one. The dotted lines of Fig. 10b represent the dotted
triangle of Fig. 10a, but with the sides rotated and “mirror imaged”. These transformations were
done in such a way that the matched corners in the two frames were now close in space. Note the
similarity of structures between the original and computed triangles. This indicates that although
the “neural network” implementation of the Structural Theory is unable to compute the matches
correctly, it may be used in some situations to bypass the correspondence problem altogether, and
make a fast (but rough) estimate of the 3-D structure of the object.

The failure of this system to obtain the correct correspondences does not imply that
the Structural Theory would fail for any implementation. On the contrary, for most rigid motions,
an exhaustive search based on the Structural Theory would give the correct answer. This is because
the right correspondences and structure of the object are often the only situation where the energy
function is exactly 0. The above failures, however, are to be taken as a serious handicap of the
Structural Theory. They show that the solution space explored by this theory is complex, i.e. it has
many local minima. This argument shows that only very elaborate, and therefore slow, methods can
find the global minimum. The Minimal Mapping Theory, on the other hand, would only yield the
correct matches for translations or relatively short rotations, independently of the implementation.
As we have shown, however, for the Minimal Mapping Theory a very fast “neural network”
implementation is always possible. The evidence that apparent motion in humans is mainly based on
minimal mapping, therefore, seems to indicate that their solution of the motion correspondence
problem gives up precision under all circumstances in favor of speed.
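
The remark about exhaustive search can be illustrated on a toy problem. The sketch below is not the procedure of the paper: it enumerates every one-to-one matching of three features together with every depth assignment on a coarse integer grid, and keeps the combinations whose rigidity cost is exactly zero. The triangle is a hypothetical 15, 20, 25 triangle (the 3, 4, 5 shape scaled by five) rotated by an assumed angle with cosine 4/5 and sine 3/5, chosen only so that all coordinates are integers. The depth of second-frame feature 0 is fixed at zero because only depth differences matter.

```python
from itertools import permutations, product

# First frame as (x, z); y is omitted because it does not affect this cost.
first = [(0, 0), (0, 15), (20, 15)]
# Observed image x in the second frame, indexed by second-frame feature.
second_x = [0, -9, 7]


def squared_length(p, q):
    """Squared distance between two (x, z) points of the first frame."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def structural_cost(perm, depths):
    """Rigidity cost over unordered pairs; perm[i] is the match of feature i."""
    cost = 0
    for i in range(3):
        for j in range(i + 1, 3):
            a, b = perm[i], perm[j]
            new = (second_x[a] - second_x[b]) ** 2 + (depths[a] - depths[b]) ** 2
            cost += (new - squared_length(first[i], first[j])) ** 2
    return cost


grid = range(-25, 26)   # assumed search range for the unknown depths
zero_cost = [(perm, (0, zb, zc))
             for perm in permutations(range(3))
             for zb, zc in product(grid, grid)
             if structural_cost(perm, (0, zb, zc)) == 0]

for perm, depths in zero_cost:
    print(perm, depths)
```

On this example only the identity matching reaches zero cost, with two depth assignments that are mirror images of each other, which is the usual depth-reversal ambiguity of orthographic projection. The search visits every matching and every grid point, which is what makes this approach slow as the number of features grows.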

We call attention to the fact that the complexity of the solution space in the
Structural Theory is not due to the use of two equations of motion, Eqs. 2.6 and 4.4, instead of
the only one used by the Minimal Mapping Theory. This complexity arises because of the more complicated
dependence of $E_R$ on the correspondence variables, $V_{ia}$, than of $E_{mm}$ (compare Eqs. 2.1 and 4.3).
In fact, Grzywacz (1986) has demonstrated that problems similar to those illustrated in Fig. 10 still
exist in a 2-D version of the Structural Theory. In this version a search for depth values (equation
of motion 4.4) is not necessary.

Besides being able to bypass the correspondence problem under some circumstances
(Fig. 10), the Structural Theory may also turn out to be useful in cases for which minimal mapping
fails. Such situations may include large rotations and motion of features past occluding boundaries
of an object. We found in our simulations that a theory that is a hybrid between the Structural and
the Minimal Mapping theories can often handle these situations. Our implementation of the hybrid
theory was done by including both the $E_{mm}$ and the $E_R$ terms in the energy function (Eq. 2.5).
This hybrid theory proved to be the best of both worlds, being able to compute simultaneously
and correctly the correspondences of the features in motion and their depth. We conclude that
although the rigidity assumption used by the Structural Theory has serious drawbacks when used
alone to solve the correspondence problem, it can significantly help when used in conjunction with
the minimal mapping assumption.

5 Discussion

This paper has described methods of implementing theories of motion correspondence using
massively parallel networks. Our emphasis has been on networks that are fast and which obtain the
correct result most of the time rather than on networks that are infallible but slow. We showed
how to design a network implementing Ullman's theory of minimal mapping and demonstrated its
effectiveness. We proved some convergence results for this network. Next we questioned whether
rigidity alone was sufficient to determine correspondence and tested a theory based on this
assumption. This theory behaved poorly but a hybrid version incorporating some elements of the Minimal
Mapping Theory worked well.

An aim of our work was to see if rigidity alone was sufficient to solve the correspondence
problem. There are a number of ways that rigidity could be used and it is infeasible to test all of
them. Instead we concentrated on a method based on the incremental rigidity scheme (Ullman,
1984), and conjectured that other schemes would give similar results. Our results suggest that
rigidity alone is unable to solve the correspondence problem, but there are two reservations. Firstly,
it is possible that other methods of using rigidity may give better results. Secondly, it is possible
that the fault lay in our choice of network and that other implementations would succeed.
To check this second possibility we designed a scheme based on simulated annealing (Kirkpatrick
et al., 1983). Trial runs indicated that the convergence of the Structural Theory did not improve.
The energy function seems to have a number of minima of similar depth and so no method, even
simulated annealing, will succeed in a reasonable time.

There are some simple psychophysical experiments that could be done to see if rigidity
is used for correspondence. Consider a triangle in space lying in a plane along the line of sight of the
viewer so that the projections of the three vertices onto the image plane lie in a straight line. As the
triangle is rotated the order of vertices in the projection will reverse. In these situations minimal
mapping will give the wrong answer. The modified version of the Structural Theory (including
minimal mapping terms) will give the correct answer. Informal psychophysics suggests that human
perception may be wrong in this case, but the results are not conclusive.
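
The order-reversal stimulus can be sketched numerically. The example below assumes that minimal mapping selects the one-to-one matching with the smallest total image displacement, and finds it by brute force rather than with the network. The triangle coordinates, the 180° rotation about a vertical axis through the origin, and the function names are hypothetical choices made for illustration.

```python
import math
from itertools import permutations


def project_x(points, angle):
    """Image x of (x, z) points after rotation by `angle` about a vertical axis."""
    return [x * math.cos(angle) - z * math.sin(angle) for x, z in points]


def minimal_mapping(first_x, second_x):
    """One-to-one matching with the smallest total image displacement."""
    n = len(first_x)
    return min(permutations(range(n)),
               key=lambda perm: sum(abs(first_x[i] - second_x[perm[i]])
                                    for i in range(n)))


# Triangle in the x-z plane (z along the line of sight), so its image is a line.
triangle = [(-2.0, 0.0), (0.0, 1.0), (3.0, 0.0)]

before = project_x(triangle, 0.0)
after = project_x(triangle, math.pi)   # the left-to-right order is reversed

match = minimal_mapping(before, after)
print(match)   # match[i] is the second-frame feature assigned to feature i
```

The true correspondence is the identity, since each vertex keeps its label through the rotation. Because the image order reverses, the nearest-neighbor style matching instead pairs the two outer vertices with each other.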

We  were  able  to  prove  that  our  minimal  mapping  network  converged  to  the  right  answer 
only  if  the  displacement  of  the  features  between  frames  was  smaller  than  the  average  distance 
between features. There are probably few situations for which minimal mapping would give the
correct  answer  if  the  displacement  of  features  is  larger  than  the  average  distance  between  them.  It 
would  be  interesting  to  devise  examples  of  these  situations  and  do  psychophysics  experiments. 

Minimal mapping is an elegant theory that gives a good description of a range of physical
phenomena. Recently, however, two psychophysical effects have been discovered that the theory
cannot account for without modifications. The first is motion inertia (Ramachandran and Anstis,
1983, 1987; Eggleston, 1984; Grzywacz, 1987). This shows that the matching of features between
two frames is influenced by their matching in previous frames; features have inertia and tend to
prefer matches in the directions in which they have been moving. The second is motion capture,
whose effects can be dramatically illustrated by Ramachandran's moving leopard analogy. If the boundary
of the leopard is invisible then the spots on the leopard are matched to their nearest neighbor. If
the boundary is visible then it “captures” the spots and their matches are different. Effects like
this can be demonstrated by experiments in which dot stimuli are captured by surrounding
contours, moving periodic gratings or other dots (MacKay, 1961; Ramachandran and Anstis, 1983b;
Ramachandran and Inada, 1985; Williams, Phillips and Sekuler, 1986). These experiments show
that minimal mapping has limitations and some modifications are needed.

The main reason for using a massively parallel network is the reduction in computation
time. The advantage arises because many problems are parallelizable, and with such a network we
can exploit the trade-off between the number of elements and the time of computation. Currently,
research is being done to construct electronic devices that implement such networks. This massive
parallelism may also lead to fault tolerance. Networks are attractive because they offer a method of
turning a problem with discrete elements into one with continuous ones, thereby making it possible
to solve a decision problem with an analog machine. Another method of turning a discrete problem
into a continuous one has been described by Marroquin (Marroquin, 1987).

A further advantage of networks of this type is their possible biological plausibility.
This argument, however, must be used cautiously. The network is composed of simple electrical
components that could simulate the dynamics of the membrane of simple neurons. Moreover, there
is similarity between the sigmoid input-output relations of the network elements and the behavior
of the synapses of neurons. However, there are a number of important differences: real neurons are
very complex (von Neumann, 1958; Koch et al., 1982; Crill and Schwindt, 1983; Kuffler et al., 1984)
and certainly do not have symmetric synaptic connections. Moreover, the brain is not one large
homogeneous network and instead has many different levels of organization. The interconnections
between neurons are constrained to be local, although well defined fiber tracts exist for long distance
communication. Therefore networks of the type we have been considering can only model local
regions of the brain.

Our networks make fast decisions, but not always the right ones. It can be argued
that sometimes it is more important to obtain fast approximate solutions to problems rather than
slow accurate ones. This is curiously similar to the arguments of Simon in decision theory (Simon,
1979). The claim is that a decision maker should, and in practice does, make quick approximate
decisions rather than being perfectly rational and finding the best possible decision regardless of
the time it takes to compute it.